Machine learning-based analysis of overall stability constants of metal–ligand complexes

The stability constants of metal(M)-ligand(L) complexes are industrially important because they affect the quality of the plating film and the efficiency of metal separation. Thus, it is desirable to develop an effective screening method for promising ligands. Although there have been several machine-learning approaches for predicting stability constants, most of them focus only on the first overall stability constant of M-L complexes, and the variety of cations is also limited to less than 20. In this study, two Gaussian process regression models are developed to predict the first overall stability constant and the n-th (n > 1) overall stability constants. Furthermore, the feature relevance is quantitatively evaluated via sensitivity analysis. As a result, the electronegativities of both metal and ligand are found to be the most important factor for predicting the first overall stability constant. Interestingly, the predicted value of the first overall stability constant shows the highest correlation with the n-th overall stability constant of the corresponding M-L pair. Finally, the number of features is optimized using validation data where the ligands are not included in the training data, which indicates high generalizability. This study provides valuable insights and may help accelerate molecular screening and design for various applications.

Furthermore, β n intrinsically depends not only on the constituent elements of the ligand but also on its molecular structure. Considering an enormous number of M-L combinations in the chemical space, it is impractical to perform measurements of the overall stability constants for all candidates to find promising ligands. Therefore, there has been a great need for efficient methods predicting stability constants of arbitrary M-L pairs to accelerate either the design or screening of ligands for specific metals.
Over the past decades, machine learning approaches have been employed to predict various properties of M-L complexes, such as the spin-state splitting 6 and the volcano plot 7 . In general, there are two ways of predicting the properties of M-L complexes by machine-learning techniques: using the features calculated from the M-L complex itself, which are usually derived from the first principles calculation, or using the features calculated (3) β n = log K 1 × · · · × K n = log [M -L n] [M][L] n .
from M and L. Because it is not obvious what three-dimensional molecular structure the M-L complex will form in an aqueous solution, most of the machine-learning studies aiming to predict overall stability constants were developed by compositional and/or topological features of metals and ligands [8][9][10][11][12][13][14][15][16][17][18] . Here, details of previous works, which are also issues to be resolved in this study, are summarized. First, the variety of cations needs to be expanded because most of the previous reports covered a limited set of less than 20 metals. Second, a machinelearning model that predicts multi-order β n needs to be developed because previous studies focused mainly on β 1 . Third, the regression models in the previous works cannot conduct Bayesian optimization, which is a powerful technique to find the optimum candidate 19,20 . Since the Bayesian optimization requires both the predicted value and predicted variance to choose the promising condition, Gaussian process regression (GPR) is the most suitable. GPR is one of the nonlinear and nonparametric regression algorithms and has been used to derive not only material and molecular properties but also force fields for molecular dynamics simulation 19 . To date, there is no report on developing GPR models for predicting stability constants. Forth, the interpretability of the machinelearning model needs to be improved. If we evaluate the relevance of both cation and ligand properties on overall stability constants, the results can be compared with physical understanding. Although Chaube et al. reported the feature importance of both cations and ligands through the analysis of their machine-learning models, such as random forest feature importance and permutation importance, none of the cation features were even in the top 10, despite β n being determined by the interaction between cation and ligand 8 . Moreover, to our knowledge, it remains unclear what kinds of properties are critical for multi-order β n . Thus, quantitatively predicting the overall stability constants of arbitrary M-L pairs in the diverse chemical space remains a challenge.
In this work, we overcome the above four obstacles. We collected experimental results for overall stability constants from existing publications to prepare an extremely large training dataset containing 19,810 data points. This original dataset is composed of two sub-datasets: one has 13,559 data points for β 1 of 57 cations and the other one has 6251 data points for multi-order β n (n = 2-6) of 50 cations. Using compositional and topological features of both cations and ligands as the descriptor, we trained a GPR model for predicting β 1 . Subsequently, we developed another GPR model for predicting multi-order β n by employing the predicted β 1 values of the corresponding M-L pairs as one of the features. To improve the interpretability of our models, we performed a sensitivity analysis. Consequently, it was found that electrical features, such as electronegativity and ionic properties, of both cations and ligands are the most important for predicting β 1 . Furthermore, the predicted β 1 value was found to have the strongest relevance to predicting multi-order β n of the corresponding M-L pair. Note that these results are consistent with the physical understanding of the complex formation. Finally, the GPR models exhibited high generalizability for ligands for which data were not contained in the training datasets and those located near the edge of the applicability domain. Our machine-learning modeling and analysis provide novel insights for complex formation and are expected to provide a pathway to accelerating molecular design and screening for various applications.

Results
Visualization of the initial dataset. Details on how the initial dataset was prepared are described in the Methods section. As one of the descriptions for the chemical space, Fig. S1 shows the distribution of the molecular weights of the ligands. To our knowledge, there is no previous study on the prediction of overall stability constants using such a large dataset (19,810 data points containing 57 cations). Due to the increased number of data points and cation species, the generalizability of the machine-learning model is expected to improve. Figure 1a summarizes the total number of entries for each cation. Note that our dataset encompasses diverse metals, including alkali metals, alkaline-earth metals, noble metals, transition metals, and rare-earth metals. One can see that there is a large amount of data for Cu 2+ , Ni 2+ , Zn 2+ , Co 2+ , Cd 2+ , Ag + , and Ca 2+ , accounting for 50% of the total data. Figure 1b shows the distribution and total numbers of data, cations, and ligands for each β n . As shown in Fig. 1b, although there are a lot of experimental results up to β 4 , the amount for β 5 and β 6 is quite small. In this study, due to this limitation on the data for β 5 and β 6 , we created two machine-learning models: a model for predicting the first overall stability constant β 1 and a model for predicting multi-order β n (n = 2-6) using appropriate descriptors (see Methods section).
Sensitivity analysis and optimization of the GPR model for predicting β 1 . We prepared a total of 118 features to create a GPR model for predicting β 1 in this study (see the Methods section). Feature selection is critically important for creating a machine-learning model with high predictive performance. In GPR, although the relevance of each feature is usually interpreted as the inverse of its length scale parameter, some previous reports have pointed out that this approach sometimes does not work well [21][22][23] . Accordingly, we evaluated the relevance of each feature via sensitivity analysis using a Kullback-Leibler (KL) divergence as a measure 23 . We set the perturbation to 0.001 during calculation. Figure 2a shows the standardized relevance of the 10 highestranked features using the GPR model with optimized hyperparameters that uses full feature β 1 (all results are listed in Supplementary Information S2). The total contribution of these 10 features reaches 0.755. As shown in Fig. 2a, the Pauling electronegativity of metals is the most relevant feature for predicting β 1 . Moreover, ionic properties, such as molecular charge, cation charge, and ionic radius, are also highly relevant. Among the ligand features, Moreau-Broto autocorrelation of topological structure features (AATS0Z, AATS0i, and ATSC3se) and fragmental features (NssO and NssNH) are in the top 10 features. AATS0Z, AATS0i, and ATSC3se are computed based on a molecular graph and depend on atomic number, ionization potential, and the Sanderson electronegativity of the elements in the ligand, respectively. NssO and NssNH correspond to the number of chemical structures, such as -O-and -NH-, respectively. In particular, oxygen and nitrogen become coordination sites due to their high electronegativity, suggesting that the relevance scores of NssO and NssNH are high. www.nature.com/scientificreports/ Next, we performed feature optimization of the β 1 GPR model while monitoring the predictive performance. Note that usual cross-validation techniques do not reproduce the original purpose of predicting unknown ligands because it is unavoidable for common ligands to remain in both training and validation data, which may result in an overestimation of the predictive performance. Thus, we extracted 20 appropriate ligands based on the applicability domain of our model and calculated mean absolute error (MAE) and coefficient of determination (R 2 ) for them. The selection rule for the validation samples is described in Supplementary Information S3, and we would like to emphasize that the 20 selected ligands are not contained in the training dataset. Figure 2b summarizes the predictive performance for the validation data using the GPR model as a function of the descriptor dimension. The features were arranged in descending order of relevance scores, as shown in Fig. 2a. Consequently, Sensitivity analysis and optimization of GPR model for predicting multi-order β n . As demonstrated in the prediction of β 1 , the feature selection is critical in predicting multi-order overall stability constants β n as well. For Co 2+ , Ni 2+ , and Cu 2+ in particular, it has been reported that there are linear correlations between β 1 and β 2 16 . In the present study, we demonstrate that the strong correlations between β 1 and β n are observed not only in other cations but also in higher coordination numbers. Figure 3 summarizes the relationship between experimental multi-order overall stability constants β n and predicted β 1 values of the corresponding M-L pair. Note that not all M-L pairs for β n are contained in the dataset for β 1 . As shown in Fig. 3, one can see a strong correlation between each of the true β n and predicted β 1 values, resulting in large positive Pearson correlation coefficients (PCC). Therefore, the predicted β 1 for the M-L n complex is expected to be a significantly effective feature for predicting multi-order β n . Because we have succeeded in predicting β 1 by combining features of cations and ligands, it is thought to be feasible to predict multi-order β n by using features of M-L complex and L. Consequently, we prepared a total of 60 features to create a GPR model for predicting multi-order β n in this study (see the Methods section).
Similar to the β 1 GPR model, Fig. 4a shows the standardized relevance of the top 10 highest-ranked features using the full-feature-used β n GPR model with optimized hyperparameters (the full result is provided in www.nature.com/scientificreports/ Supplementary Information S4). The total contribution of these features reaches 0.986. As shown in Fig. 4a, it is obvious that the predicted β 1 for the M-L pair is the most important feature. NaaO and nBridgehead are fragmental features, which are defined as the number of chemical structures like -O-among aromatic rings and the number of bridgehead atoms, respectively. The X_VSAY, such as SlogP_VSA4, PEOE_VSA13, and EState_VSA2, is defined as the sum of van der Waals surface area (VSA) of atoms whose property X lies in the range Y. In particular, PEOE_VSA13 and EState_VSA2 are related to the 3-dimensional distribution of electrons and are calculated using the partial equalization of orbital electronegativities (PEOE) method 24 and electrotopological state index (EState) method 25 , respectively. Moreover, JGI2 is also a topological feature, which is computed by a 2-ordered mean topological charge. After optimizing the number of features (see Supplementary Information  www.nature.com/scientificreports/ S3), the best predictive performances (MAE: 1.30, R 2 : 0.92) were obtained with the top 25 features, which are comparable to the predictive performance of the best β 1 model. Figure 4b shows the parity plot between true and predicted multi-order β n values of the validation samples using the best GPR model, indicating the high generalizability of our model again.

Discussion
In this section, we discuss the important features for predicting β 1 and multi-order β n . As a summary of the results obtained from the sensitivity analysis of the GPR model for predicting β 1 , electronegativity-or ionic-related features are sensitive to β 1 . In principle, when the electron polarization between the cation and the element at the coordination site of the ligand is small, a strong coordination bond is formed between them 2 . The electron distribution between them is then determined not only by the difference in electronegativities but also by the size of the cation. For β 1 , the Coulomb interaction between the cation and negatively charged ligand assists the formation of stable M-L complexes. Therefore, as shown in Fig. 2a, it is quite reasonable that features relevant to the electronegativity and ionic properties of both metals and ligands exhibited high relevance scores for predicting β 1 .
In addition, we believe that these results were successfully obtained thanks to using experimental data for various cations. Given that the electronegativities of lanthanides are very similar, we recognize that their importance was underestimated in Chaube et al. 8 . However, because PEOE_VSA2, which was the most important feature in their study, is also related to electronegativity 24 , our results do not deviate from the findings of the previous studies. Next, we focus on the relationship between multi-order β n and β 1 . Because the n-th equilibrium constant K n satisfies the relationship of K 1 > K 2 > · · · > K n , one can derive the following universal inequality: implies that the ratio β n /β 1 is always larger than 1 and β n-1 /β 1 is smaller than n, which is observed in Fig. 3, with a few exceptions. Considering β 1 reflects the cation-ligand binding strength to some degree, this suggests that a strong correlation between β n and β 1 is one of the intrinsic properties in the formation of complexes. In addition, the fact that VSA-related features are important for the multi-order β n model is presumably because 3-dimensional structures such as steric hindrance are more influential than in the case of forming M-L complexes. Finally, we would like to mention the relationship between β i and β j (i, j > 1). As shown in Fig. 3, the multi-order stability constants have a linear dependence on β 1 , which might also mean the linear relationship between β i and β j . We believe that these empirical trends can be useful to roughly predict stability constants for M-L n complexes, which became soluble only when multiple ligands are coordinated.

Conclusion
In this study, we developed two machine-learning models: one for predicting the first overall stability constant β 1 and the other for predicting the multi-order overall stability constant β n . Using a very large training dataset, the developed models covered more than 50 cations, realizing the high generalizability of our models. Note that this is the first time a machine-learning model was created to predict the multi-order overall stability constant. Moreover, the relevance scores of features for both cations and ligands are quantitatively evaluated through sensitivity analysis to improve the interpretability of our models. Consequently, the most relevant features are consistent with physical understanding for complex formation. We believe that our findings are useful for the design and screening of new ligands for various applications. In particular, because it was concluded that the predicted β 1 value was the most important property to predict multi-order β n of the corresponding M-L pair, further development of the β 1 model is expected to be necessary in the future. Finally, we would like to mention the advantages and disadvantages of our GPR models. One of the advantages is efficient searching for new ligands through Bayesian optimization, which is a topic we will study in the future. This is due to the fact that prediction uncertainty is quantified by GPR model. However, our models still cannot be applied to some cations, such as NH 4 + and UO 2 + because we focused on only single cations in this study. The descriptor for these cations may be prepared by averaging features of elements in them. By solving these remaining issues, we expect to realize a machine-learning model for predicting arbitrary complexes.

Methods
Dataset preparation. The experimental values of the n-th overall stability constants β n for the M:L = 1:n complexes (n = 1-6) and experimental conditions were collected from the NIST Critically Selected Stability Constants of Metal Complexes Database 26 and various literature  . In this study, data for several heavy metals (i.e., Am, Cm, Cf, Bk, Es, Fm, and Md) or whose ligands contain elements such as Te, Se, As, Mn, Co, Fe, W, Mo, Cr, and Re were excluded due to the difficulties in making descriptors. Moreover, we collected experimental data according to the following priorities: data with temperature of 25 °C and ionic strength of 0.1 > data with temperature of 25 °C and any ionic strength > data with any temperature and ionic strength of 0.1 > data with the maximum overall stability constant. In the case of duplicates, the data with the largest overall stability constant was employed. Consequently, 19,810 M-L n complexes remained, which consisted of 57 cations and 2706 ligands. The chemical structure of ligands is represented by SMILES (Simplified Molecular Input Line Entry System).
Feature engineering. Following the previous study 8 , we used cation properties, ligand compositional and topological features, and experimental conditions, namely temperature and ionic strength, as the machinelearning descriptors for predicting β 1 . For cation descriptors, we initially selected 12 element-level features, such as cation charge, atomic number, melting point, molar specific heat capacity, ionic radius, polarizability, electron affinity, Pauling electronegativity, and numbers of unfilled electrons in s, p, d, and f orbitals 59,60 . We used molecu-(4) β n−1 < β n < nβ 1 . www.nature.com/scientificreports/ lar descriptor calculation software Mordred to generate compositional and topological descriptors for ligands 61 .
In addition, we prepared the molecular charge of ligands in an aqueous solution as one of the ligand features. After removing features that have only a single value or null value, 587 ligand features remained. Subsequently, we calculated the Pearson correlation coefficient of the pair of ligand features i and j, Corr(i, j), and if the absolute value of Corr(i, j) is greater than 0.7, we excluded the feature j. Furthermore, to avoid multicollinearity among features, we iteratively removed the feature with the largest variance inflation factor (VIF) score until the VIF score for all features became less than 4. In the case of predicting multi-order β n , we employed the predicted β 1 , the standard deviation of the predicted β 1 , and the charge of M-L complex, namely the sum of the cation charge and molecular charge, as the descriptor for M-L complex. The descriptor for ligands consisted of ligand features that were not used in the best β 1 GPR model and the number of ligands to be additionally coordinated to the M-L complex, namely n−1. After feature engineering, the shapes of the final datasets for predicting β 1 and multi-order β n were 13,559 data × 118 features and 6251 data × 60 features, respectively.
Gaussian process regression. In GPR, a similarity between data x i and x j is measured by the kernel, such as k(x i , x j ), which in turn defines a covariance matrix. Therefore, GPR is one of the powerful techniques because it naturally quantifies predicted values and their uncertainties. A well-known kernel choice is a Matérn kernel with ν = 3/2 62,63 , which is described as follows: where σ, l, and r are hyperparameters to represent the signal amplitude, length scale referring the relevance of features, and the Euclidean distance between data x i and x j . As shown in Eq. (5), the usual Matérn kernel with ν = 3/2 has a single length scale parameter l. However, in this study, considering that the relevance of each descriptor should be different, the Matérn kernel with ν = 3/2 is modified with the automatic relevance determination (ARD) structure as follows: where d is the dimension of a descriptor. Our GPR modeling was performed using PyTorch 64 and GPytorch 65 .

Data availability
The full results of feature relevance calculated by sensitivity analysis and the details of feature optimization for the GPR model to predict multi-ligand stability constants are provided in Supplementary Information. Additional information regarding this study is available from the corresponding authors upon reasonable request.